Abstract



Respondent driven sampling (RDS) is an approach to sampling design that uses the networks of social relationships connecting members of the target population to facilitate sampling by chain referral methods. Although this leads to biased sampling (such as over-sampling participants with many acquaintances), most RDS studies measure each participant's degree and, under the fundamental RDS assumption (that the probability of sampling an individual is proportional to his degree), use inverse-probability weighting in an attempt to correct for this bias. However, this assumption is tenuous at best and should be avoided. Here we suggest a novel approach to inference in RDS that compensates for such problems by using a rich source of information that is usually ignored: the precise timing of recruitment. Our new approach, adapting methods developed for inference in epidemic processes, also allows us to develop new estimators for properties such as the prevalence of a disease and the total population size, and to test the assumption of recruitment proportional to degree. We show that these estimators are asymptotically consistent and normally distributed. This new approach thus has the potential to greatly improve the utility of data collected using RDS.

1 Introduction 

Marginalized populations often suffer a disproportionate burden of infectious disease, yet the hard-to-reach or hidden nature of these populations makes them difficult to sample, limiting our knowledge of the very populations for which surveillance and prevention should be a priority. Respondent driven sampling (RDS) is an approach to sampling design that is increasingly widely used to study marginalized or highly stigmatized groups (e.g., injection drug users, men who have sex with men, sex workers) [9, 10]. RDS overcomes the hidden nature of these populations by using the networks of social relationships that connect members of the target population to facilitate sampling by chain referral methods. "Seeds" are selected by convenience from the target population and given coupons. They use these coupons to recruit others, who themselves become recruiters. Recruits are given an incentive, usually money, for taking part in the survey and also for recruiting others. This process continues in recruitment waves until the survey is stopped. Estimation methods are then applied to account for the non-uniformly random sample selection, in an attempt to generate unbiased estimates of population composition for the target population. Following its introduction [9, 10], RDS has quickly become popular, is relied on by major funding bodies, and has been adopted by the WHO (World Health Organization) and CDC (Centers for Disease Control and Prevention) for use in HIV surveillance activities.

As pointed out in [17], "RDS" is a package of two distinct components, namely a sampling method and a method of statistical inference. The sampling method, with notable exceptions, has often been found to be efficient and popular, and has led to a wealth of new data [14]. However, the implied assumptions and performance of the second, inferential component of the RDS "package" are far more vulnerable to criticism.

The most fundamental problem that inference in RDS must address is biased sampling (such as over-sampling participants with many acquaintances), which might cause, e.g., the sample mean to be biased away from the true prevalence of the disease. Ideally, biased sampling is addressed by stratifying the sample into degree classes: this is done by measuring each participant's degree and estimating the prevalence as



$$\hat{H} = \sum_{k \ge 1} f_k \, \hat{p}_k \qquad (1)$$
where $\{f_k\}_{k \ge 1}$ is the degree distribution of the population and $\hat{p}_k$ is the estimator of $p_k$, the prevalence within degree class $k$. However, this is not usually possible because the true degree distribution of the population is not known; moreover, when only the observed degrees are used, the degree distribution is in fact unidentifiable. Denoting the degree-dependent sampling probability by $\pi_k$, it is clear that only the product $\pi_k f_k$ can be estimated, not $\pi_k$ and $f_k$ separately. Current RDS studies attempt to rectify this problem by resorting to a set of assumptions that model recruitment as a homogeneous random walk and culminate in the assumption that the sampling probability is proportional to degree, i.e. $\pi_k \propto k$. Based on this assumption, RDS researchers currently apply inverse-degree weighting as a substitute for inverse-probability weighting (i.e., a Horvitz-Thompson estimator [11]).
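To make the identifiability issue concrete, the Python sketch below (an illustration, not part of the original analysis; the function names and toy data are hypothetical) computes the stratified estimator (1) when the degree distribution $f_k$ is supplied, and the inverse-degree weighted estimator, which needs only the observed degrees. The two coincide when the sample composition is proportional to $k f_k$, i.e. when $\pi_k \propto k$ actually holds; the observed degrees alone cannot reveal whether it does.

```python
from collections import defaultdict

def stratified_prevalence(f, degrees, infected):
    """Eq. (1): sum over degree classes of f_k * p_hat_k.

    f        -- dict k -> f_k, population degree distribution (assumed known here)
    degrees  -- observed degree of each participant
    infected -- 0/1 outcome Y_i of each participant, aligned with `degrees`
    """
    n_k = defaultdict(int)     # participants per degree class
    pos_k = defaultdict(int)   # infected participants per degree class
    for k, y in zip(degrees, infected):
        n_k[k] += 1
        pos_k[k] += y
    # Only classes present in the sample yield an estimate p_hat_k = pos_k / n_k.
    return sum(f[k] * pos_k[k] / n_k[k] for k in n_k)

def inverse_degree_prevalence(degrees, infected):
    """Inverse-degree weighting: each participant is weighted by 1 / k_i."""
    weights = [1.0 / k for k in degrees]
    return sum(w * y for w, y in zip(weights, infected)) / sum(weights)

# Toy sample whose class sizes (5, 6, 8) are proportional to k * f_k for f = {1: .5, 2: .3, 4: .2}
degrees = [1] * 5 + [2] * 6 + [4] * 8
infected = [1] + [0] * 4 + [1] * 3 + [0] * 3 + [1] * 4 + [0] * 4
print(stratified_prevalence({1: 0.5, 2: 0.3, 4: 0.2}, degrees, infected))
print(inverse_degree_prevalence(degrees, infected))
```

If the same participants had been sampled with probabilities not proportional to degree, the inverse-degree estimate would be unchanged while the correct stratified estimate would differ, which is exactly the unidentifiability described above.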

We believe, however, that the assumption above ("recruitment probability proportional to degree"), seemingly necessary to restore identifiability, is highly problematic. In particular, it is very restrictive, unlikely to hold in reality, and untestable. Fortunately, most RDS studies obtain additional valuable information that is usually discarded: not only is the order of recruitment known, but also the precise timing of recruitment¹. This information can be used to overcome the above difficulty. Instead of the common naive and improvident approach, here we suggest modeling recruitment as a continuous-time counting process, using the established machinery [1, 2] applied, for example, in survival analysis, stochastic epidemics [3] and software reliability [15].

¹ Readers interested in size-biased sampling without replacement, where the sampling times are not known and only the order of sampling is known, should consult refs. [6, 8].



It is worth noting that our approach, which discards the homogeneous random walk model in favour of a "stochastic epidemic" model, is a very natural one. The recruitment process is akin to the spread of an epidemic in a population; hence, why not model it as one? This is particularly promising since it links RDS to a larger, more developed literature [3, 4, 7, 16] with a huge body of previous results.



In section 2, after introducing our new model for RDS, we discuss the related literature on epidemiological modeling and inference [16], as well as certain related models dealing with inference for software reliability [15, 12]. Then, following some technical preliminaries (sec. 2.2.1), we derive the associated maximum likelihood estimators (MLEs) and discuss their asymptotic properties (sections 2.2.2-2.2.3).

Our two main results, Theorem 1 and its application Theorem 2, demonstrate that the MLE for the degree distribution is asymptotically consistent and normally distributed, and that our new prevalence estimator is likewise asymptotically consistent and normally distributed (proofs are given in the appendix).

2 Results 

We begin by introducing our new statistical model; we then derive the associated MLE and discuss its properties. This model generalises the one suggested in a previous paper by the same authors [5], which was treated in a non-rigorous manner and studied only through simulations and evaluation on RDS data.

Our notation attempts to remain compatible with both epidemiological modeling [3] and the theory of inference for continuous-time counting processes [2, 15]; minor unavoidable clashes are explained below.

2.1 The new model. Setting and notations 

Our approach for modeling RDS assumes the following setting:

(M1) The size of the population, $N$, is not known, although we may assume it is very large.

(M2) For each degree $k$ there are $N_k$ individuals in the population with degree $k$.

(M3) Sampling is done without replacement, with $n_{k,t}$ the (right-continuous) counting process giving the number of people with degree $k$ recruited by time $t$.

(M4) Between time $t$ and $t + \Delta t$ an individual with degree $k$ is sampled with probability

$$\lambda_{k,t}\,\Delta t + o(\Delta t), \qquad \lambda_{k,t} := \frac{\beta_k}{N}\, I_t\, (N_k - n_{k,t}) \qquad (2)$$

where $I_t$ is the number of people already recruited and actively trying to recruit (invite) new individuals, and the constant $\beta_k$ is a degree-dependent "recruitment rate".

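Using $g^-$ to denote² the value of $g$ just before $t$, a more formal statement of (M4) is:

(M4') The multivariate counting process $n_t := (n_{1,t}, n_{2,t}, \ldots, n_{d_{\max},t})$ has intensity

$$\lambda_t := \Big(\frac{\beta_1}{N} I^-_t (N_1 - n^-_{1,t}),\ \frac{\beta_2}{N} I^-_t (N_2 - n^-_{2,t}),\ \ldots,\ \frac{\beta_{d_{\max}}}{N} I^-_t (N_{d_{\max}} - n^-_{d_{\max},t})\Big) \qquad (3)$$

namely, $m_t := n_t - \int_0^t \lambda_s\,ds$ is a martingale (and clearly $\lambda_t$ is predictable, i.e. non-stochastic given the past).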
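The recruitment mechanism of (M4') can be simulated directly. The Python sketch below is an illustration under stated assumptions, not the authors' code: it uses the standard Gillespie (stochastic simulation) algorithm, drawing an exponential waiting time with total rate $\sum_k \lambda_{k,t}$ and then choosing the degree class of the recruit with probability proportional to $\lambda_{k,t}$. It assumes, hypothetically, that every recruit immediately becomes an active recruiter and never stops, i.e. $I_t = I_0 + \sum_k n_{k,t}$ (one simple instance of the general $I_t$ discussed in section 2.2.1); the function `simulate_rds` and its arguments are illustrative names only.

```python
import random

def simulate_rds(N_by_degree, beta, I0, t_max, seed=None):
    """Gillespie simulation of the counting process with intensity (3).

    N_by_degree -- dict k -> N_k, size of degree class k
    beta        -- dict k -> beta_k, degree-dependent recruitment rate
    I0          -- number of seeds actively recruiting at time 0
    t_max       -- end of the observation period
    Hypothetical assumption: every recruit becomes an active recruiter at once and
    stays active, so I_t = I0 + (total number recruited by time t).
    Returns the time-ordered list of (recruitment time, degree) events.
    """
    rng = random.Random(seed)
    N = sum(N_by_degree.values())
    n = {k: 0 for k in N_by_degree}          # n_{k,t}: recruited so far in class k
    t, events = 0.0, []
    while True:
        I_t = I0 + sum(n.values())
        # lambda_{k,t} = (beta_k / N) * I_t * (N_k - n_{k,t}), cf. eq. (2)
        rates = {k: beta[k] / N * I_t * (N_by_degree[k] - n[k]) for k in N_by_degree}
        total = sum(rates.values())
        if total <= 0:                       # all classes exhausted, or no recruiters
            break
        t += rng.expovariate(total)          # exponential waiting time to the next recruit
        if t > t_max:
            break
        open_classes = [k for k in rates if rates[k] > 0]
        k = rng.choices(open_classes, weights=[rates[j] for j in open_classes])[0]
        n[k] += 1
        events.append((t, k))
    return events
```

For example, `simulate_rds({1: 30, 3: 20}, {1: 1.0, 3: 2.0}, I0=5, t_max=float("inf"), seed=1)` runs until both classes are exhausted, so it returns 50 events; with a finite `t_max` it returns only the recruitments observed in $[0, t_{\max}]$.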

The similarity of eq. (3) to the widespread Susceptible-Infected-Removed (SIR) epidemiological model [3] is quite striking. In the simplest version³ of the SIR model the susceptible set, $S$, is depleted at a rate proportional to its size and to the size of the infected set, $I$, i.e. $dS = -\beta I S\,dt$. Thus, in RDS the "inviting" set is analogous to the infectious set in standard epidemiological modeling. Similarly, $N - n$ is the analog of the susceptible set, $S$. However, previous epidemiology-related works [3, 7, 16] have usually focused on the transmission parameters (in our model the $\beta_k$'s), which are the least relevant to our application. Accordingly, they also assume knowledge of the degree distribution, which in our case is not only unknown but is actually one of the main objects of interest.

² Hopefully less cumbersome than the alternative common notation, $g_{t-}$.

³ Most of the more elaborate epidemiological models could be adapted for RDS as well. For example, it is also possible to consider the case where a person's probability of recruiting new individuals is proportional to his degree. In this case we need to replace $I_t$ in eq. (3) with $\tilde{I}_t$, the number of "edges" sampled so far; i.e., if $x_t$ is the observation at time point $t$, with $x_t = 0$ if no one was sampled and $x_t = k$ otherwise ($k$ being the degree of the sampled individual), then $\tilde{I}_t - \tilde{I}_{t-} = x_t$. An even more general "recruitment" is considered in ref. [7], addressing contagion and estimation in multitype epidemics.

These features of epidemiological models are complemented by certain models dealing with inference in the field of software reliability [15, 12]. In particular, the Jelinski-Moranda model assumes that a computer program has an unknown number of "bugs", $N$, which are detected at a rate proportional to the number of remaining (undetected) bugs; i.e., the rate of detecting the $i$-th bug is $\lambda_i = \beta\,(N - (i - 1))$. In this case the motivation and approach for estimating $N$ [15] are more akin to RDS; however, two key differences remain:

The first complication arises because the Jelinski-Moranda model is a special case of eq. (3) with $I_t = N$, whereas in RDS $I_t$ is more general and depends on the number of individuals detected (see section 2.2.1).

Second, the relatively minor complication that RDS is multivariate, unlike the univariate Jelinski-Moranda model, is further exacerbated by the fact that the $N_k$'s themselves are often nuisance parameters required for stratification further down the road.

Usually (M4') is followed by an examination of the likelihood function at time $t$, given [1, 2] by:

$$L(\beta, N; t) := \exp\Big\{\sum_k \int_0^t \log \lambda_{k,s}(\beta_k, N_k)\,dn_{k,s} - \int_0^t \sum_k \lambda_{k,s}(\beta_k, N_k)\,ds\Big\} \qquad (4)$$

however, a few technical details still need to be stipulated (see next section) before carrying out an asymptotic analysis.

2.2 Asymptotic Analysis. 

We begin by specifying a few technical details required for a simple analysis (sec. 2.2.1); we then provide the necessary details from Kurtz's theory of density-dependent processes and demonstrate convergence of the counting process to a deterministic function (sec. 2.2.2). The notation and results from section 2.2.2 are used in section 2.2.3 to present our main results, Theorem 1 and its application Theorem 2, which are proven in the appendix.

2.2.1 Asymptotic Analysis. Technical preliminaries 

In our model, one of the parameters, $N$, is the number of individuals. As discussed previously in similar settings (see [4] p. 430, and [15]), it is obviously not possible to derive any sensible large-sample result by considering a sequence of models with $N$ fixed. A more relevant large-sample situation to consider is one in which there are more and more individuals in each degree class within a larger and larger population. We therefore consider a sequence of RDS models indexed⁴ by $v$, and, by introducing a dummy variable $f_k$, we let $v f_k$ denote the size of each degree class, $N_k$. We now consider the estimation of the $f_k$'s (and $\beta_k$'s) as $v \to \infty$; the results can later be rephrased in terms of the $N_k$'s, with analogous consistency and asymptotic normality. More precisely, the consistency of $\hat{f}_k$ (i.e., $\hat{f}_k \to f_k$ as $v \to \infty$) implies $\hat{N}_k / N_k \to 1$ as $N_k \to \infty$, and similarly, concerning asymptotic normality, since $\hat{N}_k = v \hat{f}_k$:

$$\sqrt{v}\,(\hat{f}_k - f_k) \xrightarrow{d} \mathcal{N}(0, \sigma^2_{f_k}) \iff \frac{1}{\sqrt{v}}\,(\hat{N}_k - N_k) \xrightarrow{d} \mathcal{N}(0, \sigma^2_{f_k}) \qquad (5)$$

This formulation makes it clear that the number of different degree classes in the data cannot grow too fast (in order to avoid having too few observations from each degree class). The simplest and crudest restriction, which we focus on here for simplicity, is one in which the maximal degree, $d_{\max}$, is bounded by some constant, $M$, for all $v$ (note that this implies the more general and important condition⁵ $\forall i, j:\ \beta_i N_i = \Theta(\beta_j N_j)$, given that $\beta_i$ is fixed for all $i$).

In general, the process $I_t$ can evolve in an arbitrarily complicated manner⁶; for example, in the SIR model in epidemiology each infected individual is removed at rate $\gamma$, which is also of interest. However, since this removal process is both fully observed and uninteresting to us, we skip modeling it here and treat the rather general case where $I_t = I_0 + v\,g(v^{-1} n_t, t)$, where $g$ is a non-negative continuous function and $I_0$ is the initial number of seeds used for recruitment (see (T1) below).

For simplicity, we tacitly treat the observation period $[0, \tau]$ with $\tau$ a finite number; however, a more general approach, which we pursue here in order to simplify other parts of the paper, is to allow for an observation period $[0, \tau^v]$ with $\tau^v$ being stopping times tending to $\tau$ in probability as $v$ increases. In particular, denoting $N_{\min} := \min_k (N_k)$, define:

$$\tau := \inf\Big\{t : \sum_k n_{k,t} = N_{\min}\Big\} \qquad (6)$$

Although (6), which implies prior knowledge of $N_{\min}$, may appear to be a peculiar stopping time that could easily be weakened, we chose to keep it in order to avoid otherwise necessary distractions from our main point. In particular, it enables us to avoid specifying a particular $I_t$ and to preserve the very general condition (T1).

⁴ The sequence of counting processes, $n^v$, consists of the multivariate collections of the (several) univariate processes $n^v_{k,t}$, with $v = 1, 2, \ldots$ Similarly, the intensities ($\lambda^v$) and the likelihood ($L^v$) are indexed mutatis mutandis.

⁵ We write $f(N) = \Theta(g(N))$ if $f(N)/g(N) \to \text{const} > 0$.

⁶ As long as it is adapted to the self-exciting history of $n_t$.

Finally, if we define the stochastic process $x_v(t)$ by

$$x_v(t) := v^{-1} n^v_t \qquad (7)$$

then in many practical situations, as shown below, this stochastic process converges uniformly on $[0, \tau]$ in probability to a deterministic function $x_\infty(t)$ as $v \to \infty$. In order to apply Kurtz's theorem (Kurtz's law of large numbers) [13] and obtain this convergence it is customary, for example in the study of stochastic epidemics, to have the dynamics (epidemic) initiated by a positive fraction of the population. In other words, even though $I_0$ might be a very small fraction of the entire population, we still have $I_0 = \Theta(N)$.

Summarizing all the technical details of this section, we have:

(T1) $v^{-1} I_t = v^{-1} I_0 + g(v^{-1} n_t, t)$, where $g$ is a non-negative continuous function.

(T2) $I_0 / N \to \text{const} > 0$.

(T3) The maximal degree, $d_{\max}$, is bounded by some constant, $M$, for all $v$.

(T4) The observation period $[0, \tau]$ satisfies $\tau = \inf\{t : \sum_k n^v_{k,t} = N_{\min}\}$.



2.2.2 Asymptotic Analysis. Convergence to a deterministic function 

For notational convenience, let us momentarily write the parameter space $(\beta, N)$ as $\Phi$. Let $K := D([0, \tau])$ be the Skorokhod space of right-continuous functions on $[0, \tau]$ with left limits. The theory developed by Kurtz for so-called density-dependent processes deals with processes having an intensity function⁷

$$\lambda^v_t(\phi) = v\,X(t, \phi, v^{-1} n^v_{t-}) \qquad (8)$$

⁷ Recall that the superscript $v$ indexes the sequence of processes, each of which evolves in time (subscript $t$) and depends on the parameters $\phi$. The underlying process can be multivariate, and if we need to emphasize one of its components we can go further and write $\lambda^v_{j,t}(\phi)$.



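where $X : [0, \tau] \times \Phi \times K \to \mathbb{R}_+$ can be a fairly general function depending on the past of the stochastic process up to, but not including, time $t$. In the multivariate case, (8) means

$$\lambda^v_{j,t}(\phi) = v\,X_j(t, \phi, v^{-1} n^v_{t-}), \qquad j = 1, \ldots, d_{\max} \qquad (9)$$

Using our model of RDS and definitions (3) and (7), we now have for the $j$-th component of $X$ in RDS

$$X_j(t, \phi, v^{-1} n^v_{t-}) = \beta_j\,\frac{I_t}{N}\Big(f_j - \frac{n^v_{j,t-}}{v}\Big) \qquad (10)$$

which is compatible with (8) if, for example, $I_t / N$ is a function of $v^{-1} n^v_{t-}$ (as guaranteed by (T1-2) in the simplest case).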
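As a consistency check (a one-line calculation added for the reader), reading (10) as $X_j = \beta_j (I_t/N)(f_j - v^{-1} n^v_{j,t-})$ and using $N_j = v f_j$, multiplying by $v$ recovers the $j$-th component of the intensity (3):

```latex
v\,X_j\big(t,\phi,v^{-1}n^v_{t-}\big)
  = v\,\beta_j\,\frac{I_t}{N}\Big(f_j-\frac{n^v_{j,t-}}{v}\Big)
  = \frac{\beta_j}{N}\,I_t\,\big(v f_j-n^v_{j,t-}\big)
  = \frac{\beta_j}{N}\,I_t\,\big(N_j-n^v_{j,t-}\big).
```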

Two important properties of $X$ as defined in (8) and (10) are:

(P1) For all $x \in K$ and all $\phi \in \Phi$ the function $X$ satisfies:

$$\sup_{t \le \tau} X(t, \phi, x) < \infty$$

(P2) Lipschitz continuity: there exists a constant, $L$, not depending on $t$, such that for all $x, y \in K$ and all $t \in [0, \tau]$:

$$|X(t, \phi, x) - X(t, \phi, y)| \le L \sup_{s \le t} |x(s) - y(s)|$$

This makes it possible to apply Kurtz's law of large numbers and obtain:

Lemma 1. Let $\phi_0$ be the true value of the parameter $\phi \in \Phi$. The process $x_v(t)$ as defined via (7) converges uniformly on $[0, \tau]$ in probability to $x_\infty(t)$ as $v \to \infty$, where $x_\infty(t) \in D([0, \tau])$ is the unique solution of

$$x(t) = \int_0^t X(s, \phi_0, x)\,ds$$

Proof. An immediate result of Kurtz's law of large numbers [13]; see for example [4], Theorem II.5.4. □

Remark: note that (T1) easily provides similar convergence of $v^{-1} I^v_t$ to some deterministic function $I^\infty(t)$.
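Lemma 1 identifies $x_\infty$ as the solution of an ordinary differential equation, which can be integrated numerically. The Python sketch below does this with the classical fourth-order Runge-Kutta method for the components $X_j$ of (10). It adopts two hypothetical simplifications that are ours, not the paper's: $N = v$ (so that $\sum_k f_k = 1$ and $I_t/N = v^{-1} I_t$), and $I_t = I_0 + \sum_k n_{k,t}$ with $I_0/N = c$ as in (T2), which gives $X_j = \beta_j (c + \sum_k x_k)(f_j - x_j)$. For a single degree class with $\beta = f = 1$ this is the logistic-type equation $x' = (c + x)(1 - x)$, whose closed-form solution $x(t) = c\,(e^{(1+c)t} - 1)/(1 + c\,e^{(1+c)t})$ provides a check.

```python
def deterministic_limit(f, beta, c, tau, steps=2000):
    """RK4 integration of x'(t) = X(t, phi_0, x), cf. Lemma 1 and eq. (10).

    f, beta -- dicts k -> f_k and k -> beta_k
    c       -- limiting fraction I_0 / N of initial recruiters, cf. (T2)
    Hypothetical simplifications: N = v (so sum(f) == 1) and I_t = I_0 + sum_k n_{k,t},
    which give X_j = beta_j * (c + sum_k x_k) * (f_j - x_j).
    Returns (grid of times, sorted degree labels, path of x_inf on the grid).
    """
    ks = sorted(f)

    def rhs(x):
        active = c + sum(x)                  # limit of I_t / N under the assumptions above
        return [beta[k] * active * (f[k] - xk) for k, xk in zip(ks, x)]

    h = tau / steps
    x = [0.0] * len(ks)                      # x(0) = 0: nobody recruited at t = 0
    path = [x]
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * h * d for xi, d in zip(x, k1)])
        k3 = rhs([xi + 0.5 * h * d for xi, d in zip(x, k2)])
        k4 = rhs([xi + h * d for xi, d in zip(x, k3)])
        x = [xi + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for xi, d1, d2, d3, d4 in zip(x, k1, k2, k3, k4)]
        path.append(x)
    times = [i * h for i in range(steps + 1)]
    return times, ks, path
```

Comparing this path with $v^{-1} n^v_t$ from simulated recruitment data for increasing $v$ is a simple way to see the convergence asserted by Lemma 1.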

The following properties of $X$ can now also be shown to hold:

(P3) There exist neighbourhoods $\Phi_0$ and $K_0$ of $\phi_0$ and $x_\infty$, respectively, such that the function $X(t, \phi, x)$ and its first-, second- and third-order derivatives with respect to $\phi$ exist, are continuous functions of $\phi$ and $x$, and are bounded on $[0, \tau] \times \Phi_0 \times K_0$.

(P4) The function $X(t, \phi, x)$ is bounded away from zero on $[0, \tau] \times \Phi_0 \times K_0$.

(P5) For $1 \le i \le 2 d_{\max}$ let $\phi^i$ denote the parameter $f_i$ if $1 \le i \le d_{\max}$, and the parameter $\beta_{i - d_{\max}}$ if $d_{\max} + 1 \le i \le 2 d_{\max}$. The matrix $\Sigma = \{\sigma_{ij}(\phi_0)\}$ is positive definite, where for $1 \le i, j \le 2 d_{\max}$ and $\phi \in \Phi_0$:

$$\sigma_{ij}(\phi) = \int_0^\tau \sum_k \frac{\partial_{\phi^i} X_k(s, \phi, x_\infty)\;\partial_{\phi^j} X_k(s, \phi, x_\infty)}{X_k(s, \phi, x_\infty)}\,ds$$

(P3) is trivial and (P4) is an immediate consequence of (T4), whereas (P5), a long but straightforward calculation, is dealt with in the appendix.

Finally we have everything at hand to present (and prove) our main results. 

2.2.3 Asymptotic Analysis. Consistency and normality of the MLE and application to prevalence estimation.

Summarizing the previous sections⁸: in the model (M1-3, 4') it is obviously not possible to derive any sensible large-sample result by considering a sequence of models with $N$ fixed. We therefore consider a sequence of RDS models indexed by $v$ and, introducing a dummy variable $f_k$, we let $v f_k$ denote the size of each degree class $N_k$ (see footnote 4 and the discussion leading to it). The likelihood function (4) is thus also indexed by $v$ and rewritten as



$$L^v(\beta, N; t) := \exp\Big\{\int_0^t \sum_k \log\big(v X_k(s, \phi, x_v)\big)\,dn^v_{k,s} - v \int_0^t \sum_k X_k(s, \phi, x_v)\,ds\Big\} \qquad (11)$$



with $X_k$ as defined in (9) and (10), and $x_v$ as in (7). Similarly, define the log-likelihood function

$$\mathcal{C}^v(\beta, N; t) := \log L^v \qquad (12)$$



⁸ Our goal here was primarily to allow practitioners interested mostly in applications to skip sections 2.2.1 and 2.2.2 while keeping the rest of the paper "self-contained". Moving the technical sections to an appendix, on the other hand, would impede readability of the paper for theoreticians, which accounts for the slight repetition.






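and the minus observed information matrix:

$$\mathcal{I}^v_{ij}(\phi; t) := -\frac{\partial^2 \mathcal{C}^v(\phi; t)}{\partial \phi^i\,\partial \phi^j}, \qquad 1 \le i, j \le 2 d_{\max} \qquad (13)$$

with $\phi^i$ as defined in (P5).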
Our main theorems are: 

Theorem 1. Consider a sequence of RDS counting processes (M1-3) with intensity function (M4') and parameters $(N, \beta)$. Assume we can index the sequence with $v \to \infty$, obtaining a reparameterization $(f, \beta)$ with $(f_0, \beta_0)$ as the true (unknown) values. If conditions (T1-4) hold, then:

There exists a unique consistent solution $(\hat{f}_v, \hat{\beta}_v)$ to the likelihood equations $\frac{\partial}{\partial \phi} \mathcal{C}^v(\phi, \tau) = 0$. Moreover, this solution provides a local maximum of the likelihood function (11), and:

$$\sqrt{v}\,\big((\hat{f}_v, \hat{\beta}_v) - (f_0, \beta_0)\big) \xrightarrow{d} \mathcal{N}(0, \Sigma^{-1})$$

where $\Sigma$, given by (P5), can be estimated consistently from the observed information matrix.

Proof. Since the intensity $\lambda^v(\phi)$ can be written as $v X(t, \phi, v^{-1} n^v_{t-})$, where $X$ fulfils conditions (P1-5), this is an immediate result of Theorems VI.1.1 and VI.1.2 of [2]; see for example the less general (and similar to our case) Theorem 1 of [15]. ((P1-4) were treated in sec. 2.2.2, and (P5) is shown in the appendix.) □

As discussed earlier, researchers working on RDS are typically not interested in the degree distribution per se, but rather in the prevalence of, for example, HIV. However, having obtained an estimate $\hat{f}$ of the degree distribution, it is straightforward to stratify and weight the observations in order to obtain an estimate, $\hat{H}$, of, e.g., the prevalence of HIV:

$$\hat{H} = \sum_{i=1}^{n} \frac{\hat{f}_{k_i}}{n_{k_i}}\,Y_i \qquad (14)$$

where $n$ is the sample size, $k_i$ is the degree of individual $i$, $n_k$ is the number of individuals in the sample having degree $k$, $Y_i = 1$ if individual $i$ is HIV infected and $Y_i = 0$ otherwise. It is easy to see that, as an alternative to the individual-based viewpoint of (14), $\hat{H}$ could be calculated with a degree-class view in mind:

$$\hat{H} = \sum_k \hat{f}_k\,\frac{n^+_k}{n_k} \qquad (15)$$

where $n^+_k$ is the number of HIV infected individuals in the sample having degree $k$.
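The equivalence of (14) and (15) is easy to confirm numerically. In the minimal Python sketch below (function names hypothetical), `f_hat` stands for an estimated degree distribution; the toy values are made up for illustration.

```python
from collections import Counter

def prevalence_individual(f_hat, degrees, infected):
    """Eq. (14): participant i contributes f_hat[k_i] * Y_i / n_{k_i}."""
    n_k = Counter(degrees)
    return sum(f_hat[k] * y / n_k[k] for k, y in zip(degrees, infected))

def prevalence_by_class(f_hat, degrees, infected):
    """Eq. (15): sum over degree classes of f_hat[k] * n_k^+ / n_k."""
    n_k = Counter(degrees)
    n_pos = Counter(k for k, y in zip(degrees, infected) if y == 1)
    return sum(f_hat[k] * n_pos[k] / n_k[k] for k in n_k)

degrees = [1, 1, 2, 2, 2, 5]
infected = [1, 0, 0, 1, 1, 1]
f_hat = {1: 0.6, 2: 0.3, 5: 0.1}
print(prevalence_individual(f_hat, degrees, infected), prevalence_by_class(f_hat, degrees, infected))
```

Both functions return $0.6 \cdot \tfrac12 + 0.3 \cdot \tfrac23 + 0.1 \cdot 1 = 0.6$ here, up to floating-point rounding.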

Denote by $N^+_k$ the number of HIV infected individuals in the population having degree $k$.

It seems reasonable to assume that the distribution of $\sqrt{v}\,(n^+_k / n_k - N^+_k / N_k)$ is well approximated by a normal distribution $\mathcal{N}(0, \sigma^2_{p_k})$, independent of everything else (where the subscript $p_k$ indicates that this is the variance of $\hat{p}_k := n^+_k / n_k$, the estimator of $p_k := N^+_k / N_k$, the prevalence within degree class $k$).
Remark: For example, if $n^+_{k,\tau}$ is distributed hypergeometrically, $HG(N_k, N^+_k, n_{k,\tau})$, we can even justify

$$\sigma^2_{p_k} = \frac{v}{n_{k,\tau}}\,\frac{N^+_k}{N_k}\Big(1 - \frac{N^+_k}{N_k}\Big)\frac{N_k - n_{k,\tau}}{N_k - 1}.$$

Denoting similarly by $\sigma^2_{f_k}$ the variance of the estimator of $f_k$ (see (P5) and the proof of Theorem 1), our second main result is:

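Theorem 2. $\hat{H}$, the prevalence estimator given by (14), is asymptotically consistent and normally distributed:

$$\sqrt{v}\,(\hat{H} - H_0) \xrightarrow{d} \mathcal{N}\Big(0,\ \sum_k p_k^2\,\sigma^2_{f_k} + \sum_k f_k^2\,\sigma^2_{p_k}\Big)$$

where $H_0$ is the true prevalence within the population.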

Proof. A simple application of the delta method (see appendix). □
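Theorem 2 suggests a plug-in Wald interval for $H_0$. The Python sketch below is a hedged illustration, not a procedure given in the paper: it replaces the unknown $f_k$ and $p_k$ in the asymptotic variance by their estimates, takes $\sigma^2_{f_k}$ and $\sigma^2_{p_k}$ as inputs (for instance, $\sigma^2_{p_k}$ from the hypergeometric Remark), and returns $\hat{H} \pm z\sqrt{\widehat{\mathrm{avar}}/v}$. All function names are hypothetical.

```python
import math

def hypergeometric_sigma2_p(v, n_k, N_k, p_k):
    """sigma^2_{p_k} from the Remark: v * Var(n_k^+ / n_k) under HG(N_k, N_k^+, n_k), p_k = N_k^+ / N_k."""
    return v / n_k * p_k * (1 - p_k) * (N_k - n_k) / (N_k - 1)

def prevalence_wald_interval(f_hat, p_hat, sigma2_f, sigma2_p, v, z=1.96):
    """Plug-in interval based on Theorem 2 (true f_k, p_k replaced by their estimates).

    Returns (H_hat, estimated asymptotic variance of sqrt(v)*(H_hat - H_0), (lower, upper)).
    """
    H_hat = sum(f_hat[k] * p_hat[k] for k in f_hat)
    avar = sum(p_hat[k] ** 2 * sigma2_f[k] + f_hat[k] ** 2 * sigma2_p[k] for k in f_hat)
    half_width = z * math.sqrt(avar / v)     # z = 1.96 gives a nominal 95% interval
    return H_hat, avar, (H_hat - half_width, H_hat + half_width)
```

The two sums in `avar` separate the uncertainty due to estimating the degree distribution from the within-class sampling uncertainty, mirroring the two terms of Theorem 2.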

3 Discussion 

It should be emphasised that the new approach presented here and its underlying assumptions are genuinely more parsimonious, and easier to control and correct, than the naive and unsupported assumptions underlying the current approach. Despite additional "noise" and possible model misspecification, a statistical model using more information (such as the temporal statistics here) does not merely replace a set of old assumptions with new ones; in particular, the new assumptions can be tested and improved (applying, e.g., AIC or FIC), whereas the current approach has severe unidentifiability concerns.

Presumably, one major problematic requirement of RDS as a sampling method concerns securing a large enough sample from each degree class. Although it might appear, from the requirements stipulated here, that this problem is unique to our new inference approach, it is actually at least as big a concern for the standard approach. Indeed, when applying the inverse-degree approach there is a "hidden" stage of estimating the degree distribution, resulting in a prevalence estimator similar to (15) (although with different $\hat{f}_k$, of course). Thus, through our second theorem (Theorem 2) we also uncover and quantify this effect for the first time.

Here we addressed only the simplest possible frequentist model. More elaborate models could be constructed that account, for example, for homophily, by considering recruitment probabilities that depend on the state of both recruiter and recruitee and on covariates other than degree. Moreover, a Bayesian approach is possible as well: often some prior estimate of $N$ is available, and using a prior for the degree distribution in a fairly straightforward manner might alleviate the difficulties caused by sparse samples from some degree classes.

Finally, another advantage of our approach is its ease of use and integration with current methods and protocols for sampling in RDS: there is practically no need for design adjustments (apart from careful recording of interview times). The simplifications made here in representing the recruitment process (e.g., that no seeds are introduced during sampling, and that the number of coupons per recruiter is not limited) are not required and were applied merely to maintain clarity; such nuisance processes (see sec. 2.2.1, addressing (T1)) could easily be accounted for.

4 Appendix 

Theorems 1 and 2 are derived in detail below. 

For Theorem 1 we first need to show that (P5) holds.

The matrix $\Sigma$ consists of four blocks, $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$, with $A$, for example, describing the association between the different $f_j$'s and $D$ the association between the different $\beta_j$'s. A simple calculation shows that $A$ is a diagonal matrix with entries:

$$a_{ii} = \int_0^\tau \frac{\beta_i\,I^\infty_s}{f_i - n^\infty_{i,s}}\,ds \qquad (16)$$

where $I^\infty$ is the deterministic function that $v^{-1} I^v$ converges to, and similarly $n^\infty_i$ is the deterministic function that $v^{-1} n^v_i$ converges to (note that we have omitted the "just before" notation, $g^-$, from $I$ and $n$; this is possible because the integration is with respect to Lebesgue measure, $ds$).

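Similarly, the matrices $B$, $C$, $D$ are also diagonal, with entries:

$$b_{ii} = c_{ii} = \int_0^\tau I^\infty_s\,ds \qquad (17)$$

$$d_{ii} = \int_0^\tau \frac{I^\infty_s\,(f_i - n^\infty_{i,s})}{\beta_i}\,ds \qquad (18)$$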

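The invertibility of $\Sigma$ can be demonstrated by a direct calculation of its inverse, using the fact that (since $A$, $B$, $C$, $D$ are diagonal and hence commute)

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} (AD - BC)^{-1} & 0 \\ 0 & (AD - BC)^{-1} \end{pmatrix} \begin{pmatrix} D & -B \\ -C & A \end{pmatrix} \qquad (19)$$

thus we need only show that $AD - BC$ is invertible; i.e., that for all $i$

$$\int_0^\tau \frac{\beta_i I^\infty_s}{f_i - n^\infty_{i,s}}\,ds \int_0^\tau \frac{I^\infty_s (f_i - n^\infty_{i,s})}{\beta_i}\,ds - \int_0^\tau I^\infty_s\,ds \int_0^\tau I^\infty_s\,ds \ne 0 \qquad (20)$$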

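But for the first term in (20) we have

$$\int_0^\tau \frac{\beta_i I^\infty_s}{f_i - n^\infty_{i,s}}\,ds \int_0^\tau \frac{I^\infty_s (f_i - n^\infty_{i,s})}{\beta_i}\,ds = \int_0^\tau \bigg(\sqrt{\frac{\beta_i I^\infty_s}{f_i - n^\infty_{i,s}}}\bigg)^2 ds \int_0^\tau \bigg(\sqrt{\frac{I^\infty_s (f_i - n^\infty_{i,s})}{\beta_i}}\bigg)^2 ds > \Big(\int_0^\tau I^\infty_s\,ds\Big)^2$$

with strict inequality after applying the Cauchy-Schwarz inequality to the two square-root functions, whose product is $I^\infty_s$ and which are not linearly dependent (their ratio, $\beta_i / (f_i - n^\infty_{i,s})$, is not constant because $n^\infty_{i,s}$ increases with $s$).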
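The entries (16)-(18) and the positivity in (20) can be checked numerically once the deterministic limits $I^\infty$ and $n^\infty_i$ are available. The Python sketch below (names hypothetical) integrates (16)-(18) with the composite Simpson rule and computes $\sigma^2_{f_i} = d_{ii}/(a_{ii} d_{ii} - b_{ii} c_{ii})$, the $f_i$-diagonal entry of $\Sigma^{-1}$ implied by (19). As a test case it assumes, purely for illustration, a constant $I^\infty_s = c$, for which $n^\infty_{i,s} = f_i (1 - e^{-\beta_i c s})$ and the integrals have closed forms.

```python
import math

def simpson(g, a, b, steps=10_000):
    """Composite Simpson rule for the integral of g over [a, b] (steps made even)."""
    steps += steps % 2
    h = (b - a) / steps
    total = g(a) + g(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def info_entries(beta_i, f_i, I_inf, n_inf, tau):
    """Diagonal entries (16)-(18) of Sigma for degree class i; c_ii equals b_ii.

    I_inf, n_inf -- callables s -> I^inf_s and s -> n^inf_{i,s} (deterministic limits)
    """
    a = simpson(lambda s: beta_i * I_inf(s) / (f_i - n_inf(s)), 0.0, tau)
    b = simpson(I_inf, 0.0, tau)
    d = simpson(lambda s: I_inf(s) * (f_i - n_inf(s)) / beta_i, 0.0, tau)
    return a, b, d

def sigma2_f(a, b, d):
    """f_i-diagonal entry of Sigma^{-1} implied by the block inverse (19)."""
    det = a * d - b * b          # a_ii d_ii - b_ii c_ii, positive by Cauchy-Schwarz
    return d / det

def closed_form_constant_I(beta_i, f_i, c, tau):
    """(16)-(18) when I^inf_s = c, so that n^inf_{i,s} = f_i (1 - exp(-beta_i c s))."""
    x = beta_i * c * tau
    return math.expm1(x) / f_i, c * tau, -f_i * math.expm1(-x) / beta_i ** 2
```

In the constant case the determinant equals $(2\cosh x - 2 - x^2)/\beta_i^2$ with $x = \beta_i c \tau$, which is positive for every $x \ne 0$, in line with (20).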

This allows us to apply Theorems VI.1.1 and VI.1.2 of [2] and establish the "existence" part. The following calculations demonstrate the uniqueness of the solution.

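Differentiating the log-likelihood gives

$$\frac{\partial \mathcal{C}}{\partial \beta_k} = \frac{n_{k,\tau}}{\beta_k} - \frac{1}{N} \int_0^\tau (N_k - n_{k,t})\,I_t\,dt \qquad (21)$$

and

$$\frac{\partial \mathcal{C}}{\partial N_k} = \int_0^\tau \frac{dn_{k,t}}{N_k - n_{k,t-}} - \frac{\beta_k}{N} \int_0^\tau I_t\,dt \qquad (22)$$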
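These can be written as pairs of equations with two unknowns; equating them to 0 yields $\hat{N}_k$, an MLE for $N_k$, via the (unique) solution to

$$\sum_{i=0}^{n_{k,\tau} - 1} \frac{1}{N_k - i} = \frac{n_{k,\tau} \int_0^\tau I_t\,dt}{N_k \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt} \qquad (23)$$

Rearranging (23), with all products running over $0 \le j \le n_{k,\tau} - 1$, we get:

$$\sum_{i=0}^{n_{k,\tau} - 1} \prod_{j \ne i} (N_k - j) \Big[N_k \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt\Big] - n_{k,\tau} \int_0^\tau I_t\,dt \prod_{j} (N_k - j) = 0 \qquad (24)$$

and, dividing by $\prod_{j=0}^{n_{k,\tau} - 1} (N_k - j)$,

$$\Big(N_k \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt\Big) \sum_{j=0}^{n_{k,\tau} - 1} \frac{1}{N_k - j} - n_{k,\tau} \int_0^\tau I_t\,dt = 0 \qquad (25)$$

Substituting $\sum_{j=0}^{n_{k,\tau} - 1} 1$ for $n_{k,\tau}$ we get

$$\Big(N_k \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt\Big) \sum_{j=0}^{n_{k,\tau} - 1} \frac{1}{N_k - j} - \sum_{j=0}^{n_{k,\tau} - 1} \int_0^\tau I_t\,dt = 0 \qquad (26)$$

or

$$\sum_{j=0}^{n_{k,\tau} - 1} \frac{N_k \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt - (N_k - j) \int_0^\tau I_t\,dt}{N_k - j} = 0 \qquad (27)$$

and finally

$$\sum_{j=0}^{n_{k,\tau} - 1} \frac{j \int_0^\tau I_t\,dt - \int_0^\tau n_{k,t} I_t\,dt}{N_k - j} = 0 \qquad (28)$$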

Let $\hat{N}_k$ be a solution to eq. (28) and assume, to the contrary, that eq. (28) has another solution, $N^*_k \ne \hat{N}_k$, in the range $N_k > n_{k,\tau}$. Since the numerators in (28) increase with $j$, the tail of the sum consists of positive terms (and perhaps an irrelevant zero term), which are required to cancel out the negative terms comprising the beginning of the sum; write this as $Head(N_k) + Tail(N_k) = 0$, emphasizing the functional dependence of (28) on $N_k$. Let $j^*$ be the index of the first positive term in the sum (28); in other words, $Head(N_k)$ consists of $j^*$ negative terms (ignoring the possibility of an irrelevant zero term). Note that $j^*$ does not depend on $N_k$.

Since $\hat{N}_k$ is a solution to eq. (28), we have $Head(\hat{N}_k) + Tail(\hat{N}_k) = 0$. Assume without loss of generality that $N^*_k > \hat{N}_k$ (the other case is symmetric) and let $r := (\hat{N}_k - j^*)/(N^*_k - j^*) \in (0, 1)$. Replacing $\hat{N}_k$ by $N^*_k$ multiplies the $j$-th term of (28) by $(\hat{N}_k - j)/(N^*_k - j)$, a factor that is strictly decreasing in $j$. Hence $Tail(N^*_k) \le r\,Tail(\hat{N}_k)$, while $|Head(N^*_k)| > r\,|Head(\hat{N}_k)|$. We conclude that $Head(N^*_k) + Tail(N^*_k) < r\,\big(Head(\hat{N}_k) + Tail(\hat{N}_k)\big) = 0$, i.e. a contradiction.

□
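Because (28) has at most one root, $\hat{N}_k$ can be found by a one-dimensional search. The Python sketch below is one possible implementation, not the authors': it takes the data summaries $n_{k,\tau}$, $J_1 = \int_0^\tau I_t\,dt$ and $J_2 = \int_0^\tau n_{k,t} I_t\,dt$ as inputs, treats $N_k$ as a real number, and brackets the root by bisection. It relies on the fact that multiplying the profile score (22), with $\beta_k$ taken from (21), by $N_k J_1 - J_2 > 0$ gives the left-hand side of (28), so both have the same sign. Returning $n_{k,\tau}$ when that sign is already non-positive at the boundary, and `math.inf` when it stays positive for all large $N_k$, are our own conventions. Given $\hat{N}_k$, setting (21) to zero yields $\hat{\beta}_k / N = n_{k,\tau} / (\hat{N}_k J_1 - J_2)$.

```python
import math

def score_sign_G(N_k, n_tau, J1, J2):
    """Left-hand side of eq. (28): sum_{j < n_tau} (j*J1 - J2) / (N_k - j)."""
    return sum((j * J1 - J2) / (N_k - j) for j in range(n_tau))

def mle_class_size(n_tau, J1, J2, rel_tol=1e-12):
    """Solve eq. (28) for N_k >= n_tau (treated as a real number) by bisection.

    Returns n_tau if the profile score is already <= 0 at the boundary, and
    math.inf if it stays positive for all large N_k (no finite maximiser).
    """
    lo = float(n_tau)
    if score_sign_G(lo, n_tau, J1, J2) <= 0:
        return lo
    # For large N_k, N_k * G(N_k) -> J1 * n_tau * (n_tau - 1) / 2 - n_tau * J2.
    if J1 * n_tau * (n_tau - 1) / 2 - n_tau * J2 >= 0:
        return math.inf
    hi = 2.0 * lo
    while score_sign_G(hi, n_tau, J1, J2) > 0:   # expand until the sign changes
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if score_sign_G(mid, n_tau, J1, J2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_over_N(n_tau, N_hat, J1, J2):
    """Eq. (21) set to zero: beta_k / N = n_tau / (N_hat * J1 - J2)."""
    return n_tau / (N_hat * J1 - J2)
```

For example, with $n_{k,\tau} = 3$, $J_1 = 1$ and $J_2 = 1.2$, clearing denominators in (28) gives the quadratic $-0.6 N_k^2 + 3.2 N_k - 2.4 = 0$, whose root above $n_{k,\tau}$ is $(16 + \sqrt{112})/6 \approx 4.43$; the bisection recovers it.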
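For Theorem 2 we apply the delta method, assuming $\sqrt{v}(\hat{p}_k - p_k)$ converges to $\mathcal{N}(0, \sigma^2_{p_k})$, and since from Theorem 1 we have $\sqrt{v}(\hat{f}_k - f_k) \xrightarrow{d} \mathcal{N}(0, \sigma^2_{f_k})$ (see also eqs. (16)-(19)).

Define the concatenated random vector $\phi := (f, p)^T = (f_1, f_2, \ldots, f_{d_{\max}}, p_1, p_2, \ldots, p_{d_{\max}})^T$ with covariance matrix $C := \mathrm{diag}(\sigma^2_{f_1}, \sigma^2_{f_2}, \ldots, \sigma^2_{f_{d_{\max}}}, \sigma^2_{p_1}, \sigma^2_{p_2}, \ldots, \sigma^2_{p_{d_{\max}}})$. Rewrite $H$ as the function

$$h(f, p) = \sum_k f_k p_k$$

with $\nabla h(f, p) = (p_1, p_2, \ldots, p_{d_{\max}}, f_1, f_2, \ldots, f_{d_{\max}})^T$. From the delta method we have

$$\sqrt{v}\,\big(h(\hat{\phi}) - h(\phi_0)\big) \xrightarrow{d} \mathcal{N}\big(0,\ \nabla h(\phi_0)^T C\,\nabla h(\phi_0)\big)$$

where $\phi_0$ contains the true values of the associated estimated parameters. Since $h(\phi_0) = H_0$ and $\nabla h(\phi_0)^T C\,\nabla h(\phi_0) = \sum_k p_k^2 \sigma^2_{f_k} + \sum_k f_k^2 \sigma^2_{p_k}$, Theorem 2 is now established. □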



References 

[1] Odd Aalen. Nonparametric inference for a family of counting processes. The Annals of Statistics, 6(4):701-726, 1978.

[2] Per K. Andersen, Ørnulf Borgan, Richard D. Gill, and Niels Keiding. Statistical Models Based on Counting Processes. Springer Series in Statistics. Springer, corrected edition, June 1995.

[3] Håkan Andersson and Tom Britton. Stochastic Epidemic Models and Their Statistical Analysis. Lecture Notes in Statistics. Springer, first edition, July 2000.

[4] Niels G. Becker. Analysis of Infectious Disease Data. Chapman and Hall, London; New York, 1989.

[5] Yakir Berchenko, Jonathan Rosenblatt, Richard G. White, and Simon D. W. Frost. Respondent driven sampling as an epidemic process. Submitted.

[6] Peter J. Bickel, Vijayan N. Nair, and Paul C. C. Wang. Nonparametric inference under biased sampling from a finite population. The Annals of Statistics, 20(2):853-878, 1992.

[7] T. Britton. Estimation in multitype epidemics. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4):663-679, 1998.

[8] L. Gordon. Estimation for large successive samples with unknown inclusion probabilities. Advances in Applied Mathematics, 14(1):89-122, 1993.

[9] D. D. Heckathorn. Respondent-driven sampling: a new approach to the study of hidden populations. Social Problems, 44(2):174-199, 1997.

[10] D. D. Heckathorn. Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hidden populations. Social Problems, 49(1):11-34, 2002.

[11] D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.

[12] Z. Jelinski and P. Moranda. Software reliability research. In Statistical Computer Performance Evaluation, pages 465-484. Academic Press, New York, 1972.

[13] T. G. Kurtz. Gaussian approximations for Markov chains and counting processes. Bull. Internat. Statist. Inst., pages 361-376, 1983.

[14] Mohsen Malekinejad, Lisa Grazina Johnston, Carl Kendall, Ligia Regina Franco Sansigolo Kerr, Marina Raven Rifkin, and George W. Rutherford. Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: a systematic review. AIDS and Behavior, 12(4 Suppl):S105-S130, July 2008.

[15] Mark C. van Pul. Asymptotic properties of a class of statistical models in software reliability. Scandinavian Journal of Statistics, 19(3):235-253, 1992.

[16] Wasima N. Rida. Asymptotic properties of some estimators for the infection rate in the general stochastic epidemic model. Journal of the Royal Statistical Society: Series B, 53(1):269-283, 1991.

[17] Matthew J. Salganik. Commentary: Respondent-driven sampling in the real world. Epidemiology, 23(1):148-150, 2012.






